Lemmatic Machine Translation

نویسندگان

  • Stephen Soderland
  • Christopher Lim
  • Mausam
  • Bo Qin
  • Oren Etzioni
چکیده

Statistical MT is limited by reliance on large parallel corpora. We propose Lemmatic MT, a new paradigm that extendsMT to a far broader set of languages, but requires substantial manual encoding effort. We present PANLINGUAL TRANSLATOR, a prototype Lemmatic MT system with high translation adequacy on 59% to 99% of sentences (average 84%) on a sample of 6 language pairs that Google Translate (GT) handles. GT ranged from 34% to 93%, average 65%. PANLINGUAL TRANSLATOR also had high translation adequacy on 27% to 82% of sentences (average 62%) from a sample of 5 language pairs not handled by GT.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Evaluating Lemmatic Communication

The need for translation among the world’s thousands of natural languages makes information access and communication costly. One possible solution is lemmatic communication: A human sender encodes a message into sequences of lemmata (dictionary words), a massively multilingual lexical translation engine translates them into lemma sequences in a target language, and a human receiver interprets t...

متن کامل

PanLex and LEXTRACT: Translating all Words of all Languages of the World

PanLex is a lemmatic translation resource which combines a large number of translation dictionaries and other translingual lexical resources. It currently covers 1353 language varieties and 12M expressions, but aims to cover all languages and up to 350M expressions. This paper describes the resource and current applications of it, as well as lextract, a new effort to expand the coverage of PanL...

متن کامل

A Comparative Study of English-Persian Translation of Neural Google Translation

Many studies abroad have focused on neural machine translation and almost all concluded that this method was much closer to humanistic translation than machine translation. Therefore, this paper aimed at investigating whether neural machine translation was more acceptable in English-Persian translation in comparison with machine translation. Hence, two types of text were chosen to be translated...

متن کامل

The Correlation of Machine Translation Evaluation Metrics with Human Judgement on Persian Language

Machine Translation Evaluation Metrics (MTEMs) are the central core of Machine Translation (MT) engines as they are developed based on frequent evaluation. Although MTEMs are widespread today, their validity and quality for many languages is still under question. The aim of this research study was to examine the validity and assess the quality of MTEMs from Lexical Similarity set on machine tra...

متن کامل

A new model for persian multi-part words edition based on statistical machine translation

Multi-part words in English language are hyphenated and hyphen is used to separate different parts. Persian language consists of multi-part words as well. Based on Persian morphology, half-space character is needed to separate parts of multi-part words where in many cases people incorrectly use space character instead of half-space character. This common incorrectly use of space leads to some s...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2009